With the rapid growth of the Internet, many electronic services, especially electronic news services, are expanding daily, so that a large amount of useful data can be obtained. How to manage these data, including the news on news websites, is therefore an important task. However, the number of news articles grows every day, and manual classification is time-consuming and laborious. Furthermore, manual classification suffers from subjective judgment and from the difficulty of acquiring knowledge. The general method of managing such data is automatic classification, in which a machine learns the features of the documents in every category and automatically classifies the testing data. The advantages of automatic classification are that the classification can be completed in a short time and that the results are more objective and consistent. Currently, document classification is extensively applied in word sense disambiguation, information retrieval, information filtering, web classification, computer-assisted reading, etc.
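The general idea of automatic classification described above, learning the features of each category from training documents and then classifying testing data, can be illustrated with a minimal sketch. The sketch assumes a simple word-count model with add-one smoothing (a naive Bayes style scorer); this technique, the function names, and the sample news sentences are illustrative assumptions only and are not taken from the present disclosure.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: whitespace tokenization is enough for this illustration.
    return text.lower().split()


def train(labeled_docs):
    """Learn per-category word counts from (category, text) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for category, text in labeled_docs:
        word_counts[category].update(tokenize(text))
        doc_counts[category] += 1
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the category with the highest smoothed log-probability."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, float("-inf")
    for category, counts in word_counts.items():
        total_words = sum(counts.values())
        # Category prior plus add-one smoothed word likelihoods.
        score = math.log(doc_counts[category] / total_docs)
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical training data, invented for this example.
training_data = [
    ("sports", "the team won the match with a late goal"),
    ("sports", "the player scored in the final game"),
    ("finance", "the stock market fell as interest rates rose"),
    ("finance", "the bank reported higher quarterly profit"),
]
word_counts, doc_counts = train(training_data)
sports_label = classify("the striker scored a goal in the match", word_counts, doc_counts)
finance_label = classify("the bank raised interest rates", word_counts, doc_counts)
```

In this sketch the model is trained once and then applied unchanged to every testing document, which corresponds to the static style of learning.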
With regard to automatic classification, the traditional learning methods are primarily focused on static knowledge. The disadvantages of these methods are that the training model of every category is learned only once, that a large amount of training data is required, and that the learned knowledge of every category cannot be continuously updated according to the testing data. For example, the invention of the Taiwan Patent No. 1249113 lacks dynamic knowledge adjustment (adaptive learning). In addition, current classification systems have the problem that the inference of an unknown category is not related to the features of the testing corpus.
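The difference between static knowledge and dynamic knowledge adjustment can be illustrated with a small sketch. It assumes a hypothetical per-category bag-of-words model whose score is the fraction of tokens already seen by the category; the class name, the threshold value, and the sample texts are assumptions made for illustration and do not represent the method of the cited patent or of the present invention.

```python
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization is enough for this illustration.
    return text.lower().split()


class CategoryModel:
    """Hypothetical per-category model: a bag of word counts."""

    def __init__(self, seed_texts):
        self.counts = Counter()
        for text in seed_texts:
            self.counts.update(tokenize(text))

    def score(self, text):
        # Fraction of the tokens in the text that this category has seen.
        tokens = tokenize(text)
        if not tokens:
            return 0.0
        return sum(1 for w in tokens if w in self.counts) / len(tokens)

    def update(self, text):
        # Fold a classified testing document back into the category knowledge.
        self.counts.update(tokenize(text))


def classify(text, models, adaptive=False, threshold=0.5):
    """Pick the best-scoring category; optionally update it with the text."""
    best = max(models, key=lambda name: models[name].score(text))
    if adaptive and models[best].score(text) >= threshold:
        models[best].update(text)
    return best


def build_models():
    # Hypothetical seed documents, invented for this example.
    return {
        "sports": CategoryModel(["team won the match"]),
        "finance": CategoryModel(["stock market fell"]),
    }


static_models = build_models()
adaptive_models = build_models()

# The same testing document is classified with and without updating.
static_label = classify("striker won the match", static_models, adaptive=False)
adaptive_label = classify("striker won the match", adaptive_models, adaptive=True)

# Only the adaptive model has learned the new word "striker".
static_score = static_models["sports"].score("striker injured")
adaptive_score = adaptive_models["sports"].score("striker injured")
```

Both variants assign the first document to the same category, but only the adaptive variant gains knowledge from it, so a later document that reuses the new vocabulary scores higher against the updated category.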
In order to overcome the drawbacks in the prior art, a method and a system for document classification are provided in the present invention. The particular design of the present invention not only solves the problems described above, but is also easy to implement. Thus, the present invention has utility for the industry.